@quic-sanising quic-sanising commented Sep 4, 2025

📢 Expanded On-Device Sampling Support in QEfficient

Excited to share that On-Device Sampling—previously available only for LlamaForCausalLM—is now supported across a broader set of architectures! This enhancement brings faster, more efficient inference directly to the QAIC device.

✅ Newly Supported Architectures:

  1. FalconForCausalLM
  2. GemmaForCausalLM
  3. GPT2LMHeadModel
  4. GPTJForCausalLM
  5. GraniteForCausalLM
  6. GraniteMoeForCausalLM
  7. LlamaForCausalLM (existing)
  8. MptForCausalLM
  9. Phi3ForCausalLM
  10. Qwen2ForCausalLM

⚠️ Architectures Still Pending Support:

  1. GPTBigCodeForCausalLM
  2. InternVLChatModel
  3. MistralForCausalLM
  4. MixtralForCausalLM
  5. LlamaSwiftKVForCausalLM
  6. Grok1ModelForCausalLM

We’re actively working to extend support to these models. Contributions, feedback, and testing from the community are always welcome to help accelerate this effort!
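As a rough illustration of how the two lists above could be used, here is a minimal sketch that classifies a model by the `architectures` field found in a Hugging Face style `config.json`. The helper name `sampling_support` and the constants are hypothetical and not part of the QEfficient API; the architecture names are copied from the lists in this description.

```python
# Architecture names copied from the two lists in this PR description.
SUPPORTED = frozenset({
    "FalconForCausalLM", "GemmaForCausalLM", "GPT2LMHeadModel",
    "GPTJForCausalLM", "GraniteForCausalLM", "GraniteMoeForCausalLM",
    "LlamaForCausalLM", "MptForCausalLM", "Phi3ForCausalLM",
    "Qwen2ForCausalLM",
})
PENDING = frozenset({
    "GPTBigCodeForCausalLM", "InternVLChatModel", "MistralForCausalLM",
    "MixtralForCausalLM", "LlamaSwiftKVForCausalLM", "Grok1ModelForCausalLM",
})


def sampling_support(config: dict) -> str:
    """Hypothetical helper (not a QEfficient API).

    Takes a dict shaped like a Hugging Face config.json and reports whether
    any listed architecture is 'supported', 'pending', or 'unknown'.
    """
    archs = config.get("architectures") or []
    if any(a in SUPPORTED for a in archs):
        return "supported"
    if any(a in PENDING for a in archs):
        return "pending"
    return "unknown"
```

For example, `sampling_support({"architectures": ["MistralForCausalLM"]})` falls into the pending group, while a config with no `architectures` entry is reported as unknown.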

quic-sanising and others added 30 commits June 18, 2025 13:38
Signed-off-by: quic-sanising <[email protected]>
Signed-off-by: sanising <[email protected]>
@quic-sanising quic-sanising changed the title Extend On Device Sampling Support to more Causal Language Models Extend On-Device Sampling Support to more Causal Language Models Sep 4, 2025
@quic-sanising
Contributor Author

Depends on PR #463.

@quic-sanising quic-sanising changed the base branch from main to ods-unit-tests September 4, 2025 20:26
@quic-sanising quic-sanising changed the base branch from ods-unit-tests to main September 4, 2025 20:26
@quic-hemagnih
Contributor

Have we tested these models for On-Device Sampling? Have you added test cases in the CI for these models?

@quic-hemagnih
Contributor

Also, please rebase it. Once you post your testing confirmation, we can go ahead and merge this PR.

Contributor

@quic-hemagnih quic-hemagnih left a comment

Please confirm:

  1. Have you tested the newly added models for On-Device Sampling?
  2. Please rebase.
  3. Can you add a few models to the CI?

@quic-sanising
Contributor Author

  1. Yes, I have tested it locally and the feature works for each of the 10 architectures mentioned above.
  2. Rebase done.
  3. AFAIK, you prefer lightweight models for the CI. For each new architecture, we could add one model config to test (similar to the TinyLlama model configs for the LlamaForCausalLM architecture in the current CI). If this sounds good, can you please provide a list of the models you want tested? Also, keep in mind that each new model config needs two sets of ground truth: one for greedy sampling and one for random sampling. Let me know how you want to proceed.
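To illustrate why the two sampling modes need separate ground truth, here is a minimal host-side sketch in plain Python. It is not the on-device sampler: the function names, the single `temperature` parameter, and the seeded `random.Random` are assumptions made for illustration only. Greedy sampling depends only on the logits, while random sampling also depends on the sampling parameters and the random state, so its reference outputs have to be recorded under a fixed seed.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits, scaled by temperature."""
    m = max(logits)
    exps = [math.exp((x - m) / temperature) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_sample(logits):
    """Greedy sampling: always pick the index of the largest logit."""
    return max(range(len(logits)), key=logits.__getitem__)


def random_sample(logits, temperature, rng):
    """Random sampling: draw one index from the temperature-scaled distribution.

    `rng` is a random.Random instance; a fixed seed makes the draw repeatable.
    """
    probs = softmax(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


logits = [0.1, 2.5, -1.0, 1.7]

# Greedy output is fully determined by the logits.
greedy_token = greedy_sample(logits)

# Random output is repeatable only when the seed is pinned.
token_a = random_sample(logits, temperature=0.8, rng=random.Random(0))
token_b = random_sample(logits, temperature=0.8, rng=random.Random(0))
```

Here `greedy_token` is the index of the largest logit, and `token_a` equals `token_b` only because both draws use the same seed. With a different seed or temperature the random result can change, which is why a greedy reference cannot double as the random-sampling reference.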

@quic-hemagnih quic-hemagnih merged commit 35d8fd8 into quic:main Nov 1, 2025
5 checks passed
3 participants